On the usefulness of Meyer wavelets for 
deconvolution and density estimation 



Jeremie Bigot 
March 2009 

Abstract 

The aim of this paper is to show the usefulness of Meyer wavelets for the classical problem of 
density estimation and for density deconvolution from noisy observations. By using such wavelets, the 
computation of the empirical wavelet coefficients relies on the fast Fourier transform and on the fact 
that Meyer wavelets are band-limited functions. This makes such estimators very simple to compute 
and this avoids the problem of evaluating wavelets at non-dyadic points which is the main drawback 
of classical wavelet-based density estimators. Our approach is based on term-by-term thresholding of 
the empirical wavelet coefficients with random thresholds depending on an estimation of the variance 
of each coefficient. Such estimators are shown to achieve the same performances of an oracle estimator 
up to a logarithmic term. These estimators also achieve near-minimax rates of convergence over a large 
class of Besov spaces. A simulation study is proposed to show the good finite sample performances of 
the estimator for both problems of direct density estimation and density deconvolution. 
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1 Introduction 

Density estimation is a well-known problem in statistics that has been thoroughly studied. It consists in 
estimating an unknown probability density function f from an independent and identically distributed 
(iid) sample of random variables X, for i = \,...,n, with n representing the sample size. Wavelet 
decomposition is known to be a powerful method for nonparametric estimation, see e.g. Donoho, 
Johnstone, Kerkyacharian and Picard (1995). The advantages of wavelet methods is their ability in 
estimating spatially inhomogeneous functions. They can be used to estimate functions in Besov spaces 
with optimal rates of convergence, and have therefore received special attention in the literature over 
the last two decades, in particular for density estimation, see e.g. Donoho, Johnstone, Kerkyacharian 
and Picard (1996) and Vidakovic (1999) for a detailed review on the subject. 



For a given scaling function <p and mother wavelet ip, the scaling and wavelet coefficients are usually 
estimated as Donoho et al (1996) 




(1.1) 



where tpj ^{x) = <p(2i°x — k), ^pj^{x) = ip(2>x — k), and jo denotes the usual coarse level of resolution. A 
hard-thresholding estimator of / then takes the form 



where j\ is an appropriate frequency cut-off, and where the Xt ^'s are appropriate thresholds (positive 
numbers) that are possibly level-dependent. However, in practice the computation of the coefficients 
(1.1) requires some numerical approximation by interpolation as one typically only knows the values 
of the functions <fy jt and tpj^ at dyadic points. Various approaches have been used to approximate 
numerically the coefficients in (1.1). For instance Koo and Kooperberg (2000), Antoniadis, Gregoire and 
Nason (1999) use a binning method followed by the the standard discrete wavelet transform, while the 
algorithm proposed in Herrick, Nason and Silverman (2001) is based on an approximation of the scaling 
coefficients at a sufficiently small level of resolution. 

In this paper, we propose to avoid the use of such numerical approximation schemes. This is 
achieved by using Meyer wavelets which are band-limited functions. Indeed, using such wavelets and 
thanks to the Plancherel's identity, the empirical coefficients can be easily computed from the Fourier 
transform of the data X,, i = 1, . . . , n. Such an approach therefore takes full advantages of the fast Fourier 
transform and of existing fast algorithms for Meyer wavelet decomposition developed by Kolaczyk 



As Meyer wavelets are band-limited functions, they have been recently used for deconvolution 
problems in nonparametric regression by Johnstone, Kerkyacharian, Picard and Raimondo (2004), 
Pensky and Sapatinas (2008), and for density deconvolution by Pensky and Vidakovic (1999), Fan and 
Koo (2002), Bigot and Van Bellegem (2009). Moreover, the use of such wavelets leads to fast algorithms, 
see Kolaczyk (1994) and Raimondo and Stewart (2007). 

Density deconvolution is the problem of estimating the function / when the observation of the 
random variable X is contaminated by an independent additive noise. In this case, the observations 
at hand are a sample of variables Y,, i — 1, . . . , n such that 



where X; are iid variables with unknown density /, and e\ are iid variables with known density h which 
represents some additive noise independent of the X,'s. Density estimation from a noisy sample is of 
fundamental importance in many practical situations, and applications can be found in communication 
theory Masry (2003), experimental physics (e.g. Kosarev et al, 2003) or econometrics Postel-Vinay and 
Robin (2002). In this setting, the density of the observed variables Y; is the convolution of the density 
/ with the density h of the additive noise. Hence, the problem of estimating / relates to nonparametric 
methods of deconvolution which is a widely studied inverse problem in statistics and signal processing. 
However the indirect observation of the data leads to different optimality properties, for instance 
in terms of rate of convergence, than the direct problem of density estimation without an additive 
error. Standard techniques recently studied for density deconvolution include model selection Comte, 
Rozenholc and Taupin (2006b), kernel smoothing Carroll and Hall (1988), spline deconvolution Koo 
(1999), spectral cut-off Johannes (2008) and wavelet thresholding Pensky and Vidakovic (1999), Fan and 



k j=j k 



(1.2) 



(1994). 



Y; = X,- + £j, i = l,...,n, 



(1.3) 
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Koo (2002), Bigot and Van Bellegem (2009), to name but a few. Again, Meyer wavelets can be used to 
easily compute estimators of the wavelet coefficients of / by using the Fourier coefficients of the noise 
density h without using any numerical approximation scheme. 

The second contribution of this paper is the use of random thresholds T^. Classically, the thresholds 
used in wavelet density estimation are deterministic, but such thresholds may be too large in practice 
as they do not take into account the fact that the variance of each empirical wavelet coefficient pi % 
depends on its location (j, k) . The use of random thresholds has been recently proposed in Juditsky 
and Lambert-Lacroix (2004) in the context of density estimation, and in Reynaud-Bouret and Rivoirard 
(2008) for Poisson intensity estimation. However, the estimation procedure in Juditsky and Lambert- 
Lacroix (2004) and Reynaud-Bouret and Rivoirard (2008) is different from the one we propose, since it 
is based on biorthogonal bases and on the use of the Haar basis to compute the wavelet coefficients. 
The use of data based threshold exploiting the variance structure of the empirical wavelet coefficients 
is also proposed in Herrick et al (2001), but the theoretical properties of the resulting algorithms are not 
studied. In this paper, we show that using Meyer wavelets allows one to compute easily an estimation 
of an upper bound of the variance of the pj^'s that is then used to compute random thresholds T; 

Then, a third contribution of this paper is that the resulting hard-thresholding estimators are shown 
to attain the same performances (up to logarithmic terms) of an ideal estimator, called oracle as its 
computation depends on unknown quantities such as the variance of the fij^'s or the magnitude of the 
true wavelet coefficients. Oracle inequalities is an active research area in nonparametric statistics (see 
e.g. Johnstone (2002), Candes (2005) for detailed expositions) which has recently gained popularity. 
Deriving an oracle inequality is the problem of bounding the risk of a statistical procedure by the 
performances of an ideal estimator which represents the best model for the function to recover. Oracle 
inequalities are currently used in many different contexts in statistics. They have been introduced 
in Donoho and Johnstone (1994) for nonparametric regression with wavelets, then used by Cavalier, 
Golubev, Picard and Tsybakov (2002) for inverse problems in the white noise model, by Rigollet (2006), 
Castellan (2003), Efromovich (2008), Bunea, Tsybakov and Wegkamp (2007) for density estimation 
problems, and by Reynaud-Bouret and Rivoirard (2008) for estimating the intensity of a Poisson process, 
to name but a few. 

The rest of this paper is organized as follows. In Section 2, we provide some background on Meyer 
wavelets and we define the corresponding wavelet estimators with random thresholds for both direct 
density estimation and density deconvolution. In Section 3, it is explained how the performances of 
such estimates can be compared to those of an oracle estimate, which leads to non-asymptotic oracle 
inequalities. Asymptotic properties of the estimators are then studied in Section 4. Depending on the 
problem at hand., these estimators are shown to achieve near-minimax rates of convergence over a large 
class of Besov spaces. Finally, a simulation study is proposed in Section 5 to evaluate the numerical 
performances of our estimators and to compare them with other procedures existing in the literature. 
The proofs of the main theorems are gathered in a technical Appendix. 

2 Density estimation with Meyer wavelets 

In what follows, it will be assumed that the density function / of the X;'s has a compact support included 
in [0, 1] . Of course, assuming that the support of / is included in [0, 1] would not hold in many practical 
applications and this is mainly made for mathematical convenience to simplify the presentation of the 
estimator. If the range of the data is outside [0,1], one can simply rescale and center them such that they 
fall into [0,1], and then apply the inverse transformation to the estimated density. 
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2.1 Wavelet decomposition and the periodized Meyer wavelet basis 

Let (<p*, tp* ) be the Meyer scaling and wavelet function respectively (see Meyer (1992) for further details). 
It is constructed from a scaling function <p* with Fourier transform 

4>(co) = / <b*(x)e 1C0X dx = < V2 1 1 ^ 

Jr | if ^1 >47r /3 ; 

where /; : C — »• R is a smooth function chosen as a polynomial of degree 3 in our simulations. Scaling 
and wavelet function at scale / > are defined by 

f jik {x) = 2' /2 (p* (2? x - k) and ipj^x) = 2> /2 ip* (2'x - k), k = 0, . . . ,2> - 1. 

As in Johnstone et al (2004) , one can then define the periodized Meyer wavelet basis of L 2 ([0, 1]) (the 
space of squared integrable functions on [0, 1]) by periodizing the functions ($*, xp*) i.e. 

<p jM (x) = 2i /2 £ cp*(2'(x + i) - k) and % k {x) = 2' /2 £ ip*(2'(x + i) - k), k = 0,. . .,2> - 1. 

iez ieZ 

For any function / of L 2 ( [0, 1] ), its wavelet decomposition can be written as: 

2/0-1 +00 27-1 

/ = E Cjorfjo* + E E W}*> 
fc=0 ;=;'o fc=0 

where c ; - Q/fc = {f,<pj jc) = f f(u)(p jork (u)du, fa = (f,ipj ik ) = Jq f(u),ip jik {u)du and ; > denotes the 
usual coarse level of resolution. Moreover, the L 2 norm of / is given by 

2^0-1 +002/-I 

!l/!| 2 = E c l,k+ E E Pjjr 
fc=0 ;=/ fc=0 

Meyer wavelets can be used to efficiently compute the coefficients C; ik and fij /k by using the Fourier 
transform. Indeed, let e^(x) = e 2ni ^ x , I £ Z and denote by = (f,e^) = f(u)e~ 2me "du the Fourier 
coefficients of / supposed to be a function in L 2 ( [0, 1] ) . By the Plancherel's identity, we obtain that 

where t^' = (xp; ik ,e() denote the Fourier coefficients of tpj^ and C; = {£ 6 Z;t^' 7^ 0}. As Meyer 
wavelets t/^ are band-limited C; is a finite subset set of [— 2- ,+2 co, — 2^Co] U [2/co,2/ +2 co] with Co = 2zr/3 
(see Johnstone et al (2004)). 

2.2 The case of direct density estimation 

Based on a sample X\, . . . , X n , an unbiased estimator of ff is given by jj Y^ n= \ exp(—2rci£X m ) which 
yields an unbiased estimator of /3; k given by 

eeCj \ n m=i / 

The coefficients (2.2) can therefore be easily calculated by combining the fast Fourier transformation of 
the data with the fast algorithm of Kolaczyk (1994) which relies on the fact that the first sum in (2.2) 
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only involves a finite number of terms. Equation (2.2) also shows that using Meyer wavelets, which 
are band-limited functions, avoids the use of numerical schemes to approximate the computation of 
the coefficients (1.1) as one typically only knows the values of the functions <pi 0l k and ipj^ at dyadic 
points. We define the estimators of the scaling coefficients c- jQ ^ analogously, with <p instead of ip and 

C jo = {£e "Z;<pf k ^ 0} instead of Cy. 

2.3 The case of density deconvolution 

Consider now the problem (1.3) of density deconvolution. Denote by fy = (f, eg) the Fourier coefficients 
of f, and by h( = J R h(u)e^{u)du the Fourier coefficients of the error density h. Since f e = E(e~ 27liex ^) 
and hp = E(e _27ri ^ ei ), it follows by independence that E(e~ 27n£Yl ) = fghf. This equality and the 
Plancherel's identity (2.1) therefore implies that an unbiased estimator of /3y ^ is given by (assuming 
that h e ^ for all !eZ) 

fa = E #* [\ t *- 2nUY A where iff* = (2.3) 

It is well-known that the difficulty of the deconvolution problem is quantified by the smoothness 
of the error density h. The so-called ill-posedness of such inverse problems depends on how fast the 
Fourier coefficients h( tend to zero. Depending on the decay of these coefficients, the estimation of fg 
will be more or less accurate. In this paper, we consider the case where the h/s have a polynomial decay 
which is usually referred to as ordinary smooth convolution (see e.g. Fan (1991)): 

Assumption 2.1 The Fourier coefficients ofh have a polynomial decay which means that there exists a real v ^ 
and two positive constants C m in, C m ax such that for all <eZ, C m in|^| _1/ < \h(\ < C ma \\£\~ v . 

The rates of convergence that can be expected from a wavelet estimator depend on such smoothness 
assumptions and are well-studied in the literature, and we refer to Pensky and Vidakovic (1999), Fan 
and Koo (2002) for further details. 

Finally note that to simplify the presentation, we prefer to use the same notation pj and c^ for 
both problems of direct density estimation and density deconvolution. 

2.4 Thresholding of the empirical wavelet coefficients 

Based on an estimation of the scaling and wavelet coefficients, a linear wavelet estimator of / is of the 

form f L = Y%l~o 1 Cjo,k<Pj ,k + EyLy Y%=q fa$j,k- For an appropriate choice of fa one can show that f L 
achieves optimal rates of convergence among the class of linear estimators. Typically, if / belongs to 
a Sobolev space H s with smoothness order s, then for direct density estimation the choice !) x m n' Js + 1 
yields optimal rates of convergence for the quadratic risk, see Donoho et al (1996), Juditsky and Lambert- 
Lacroix (2004) for further details . However, this choice is not adaptive because it depends on the 
unknown smoothness s of /. It is well known that adaptivity can be obtained by using nonlinear 
estimators based on appropriate thresholding of the estimated wavelet coefficients. A non-linear 
estimator by hard-thresholding is defined by 

210-1 ji 27-1 

h = E tjorfk* + E E fa\h^^ {2A) 

fc=o Ho fc=o " " 

where the Tjyt's are appropriate thresholds (positive numbers). Various choices for jj and the threshold 
Tjjc have been proposed. In the case of direct density estimation, Donoho et al (1996) recommended to 
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take level-dependent threshold xjyj. ~ y/j/n and D x ~ log " ^ . For density deconvolution, one possible 



calibration in ordinary smooth deconvolution is D x ~ n 2 "+i and t^ j. ~ -^i, see Pensky and Vidakovic 

(1999). The choices ~ 2 1 '-' \J j/ n and ~ 2 v i\J 2^Si!li have also been considered in Fan and Koo 
(2002) and Bigot and Van Bellegem (2009) respectively 

However, such thresholds may be too large in practice as they do not take into account the fact 
that the variance of each empirical wavelet coefficient py^ depends on its location (j, k) . Consider the 
problem of density deconvolution and let us denote the variance of p; £ by 

^ = E (^/c-M 2 - 

Let us also denote by tpj ^ the function defined for y 6 ]R by 

hkiv) = E $ k e- 2nUv - 

By definition, it follows that pyjt = \ Y? m =\ $j,k(Ym) and thus aj k = iVar(^ ; y t (Y 1 )). Hence, a simple 

upper bound for aj k is = ^-E(ipy ^(^l) 2 )- Then, simple algebra shows that in the case of density 
deconvolution 

1 



V, 



~ ~y\ I to \^i,k(y)\ 2 f Y (y) d y = ~ E ^ 



(2.5) 



with f Y (y) = Jq f(u)h(y — u)du and fj_ e , = Ee e ') Y i. An unbiased estimator of Vj k is thus given 

by 



V, 



m=l 



m=l 



E £ 



.Tj,k-2niCY„, 



(2.6) 



Similar computations can be made for the case of direct density estimation with ip*j instead iftf c , fi-p 
instead of fj_£i, and X,„ instead of Y m in the above equations (2.5) and (2.6). Note that V;^ can also 
i - 2 

be written as = -^2 X^m=i ^PjJcO^m) , but its calculation is obtained from the Fourier coefficients 
(i/j^) £ e Cj an d n °t from ipj k whose computation at non dyadic points requires numerical approximation. 
Alternatively, one could also use an estimation of the variance c 2 j. given by 

1 



V, 



instead of the upper bound Vj^. However, this does not change significantly our results since crj k and 
Vjfc are very close for n sufficiently large. Moreover, oracle inequalities are simpler to derive using an 
estimated upper bound for crj k . 

A thresholding rule is usually chosen by controlling the probability of deviation of j6nfc from the 
true wavelet coefficient /5y j.. From Lemma 5.3 (see the Appendix) one has that for any positive x, 

P ~ M ^ JWj& + m x ) ^ 2exp(-x) where 



'/./ 



E wh 



(2.7) 
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which would suggest to take a threshold of the form 



where 5 > is a tuning parameter. Thinking of the classical universal threshold, one would take 5 = 1. 
However, this choice can be too conservative and the results of this paper shows that it is possible to take 
5 smaller that 1. Moreover, it is shown that the choice of this tuning parameter depends on the highest 
resolution level ]\ and the degree of ill-posedness v in the case of density deconvolution. Throughout 
the paper, we discuss the choice of 5 and finally propose data-based values for its calibration. 

Obviously r* k is an ideal threshold as Vj ^ is unknown. Based on Lemma 5.4 (see the Appendix) 
which gives a control on the probability of deviation of Vj ^ from Vj ^, we propose to use the following 
random thresholds 



r j,k 




where K = § + y § 

;' k ~ i k 

Again, for direct density estimation, we take the same thresholds with M instead of ip { ' to compute 
rjj in (2.7). The above choice for Ty^ resembles to the universal threshold log(ji) proposed by 
Donoho and Johnstone (1994) in the context of nonparametric regression with homoscedastic variance 
(j 2 . Here we exploit the fact that in the context of density estimation the variance of a wavelet coefficient 
depends on its location (j,k) and has to be estimated. 

The additive terms ^°^ n f\] and y 2<51og(n)V ;/ )t^ + 5 log (n)K-^ will allow us to derive oracle 

inequalities. Indeed, in Section 3, we compare the quadratic risk of f n to the risk of the following oracle 
estimator 

2^0-1 h 73-1 

h = E e h#Pjo*+L E hAw>^i- k (2 - 9) 

Note that /„ is an ideal estimator that can not be computed in practice as its depends on the unknown 
coefficients /3^/ c of / and the unknown variance terms c?. However, we shall use it as a benchmark to 
assess the quality of our estimator. The quadratic risk of /„ is 

2^0-1 ji 2>-t +oo 2/-1 

E||/n -/ii 2 = E °f ,k+ E E m HP>lk>rf,k) + E E ( 2 - 10 ) 

k=0 j=j k=0 /=/l+l k=0 

where tr?^ = E(cy i- — c^ ^) 2 - Equation (2.10) shows that we retrieve the classical formula for the 
quadratic risk of an oracle estimator given in Donoho and Johnstone (1994) except that the variance 
term aj k is not constant as in standard nonparametric regression with homoscedastic variance. 

Data-driven thresholds based on an estimation of the variance of the empirical wavelet coefficients 
have already been proposed in Juditsky and Lambert-Lacroix (2004) in the context of density estimation. 
However, the estimation procedure in Juditsky and Lambert-Lacroix (2004) is different from ours since 
it is based on biorthogonal bases and on the use of the Haar basis to compute the wavelet coefficients. 
Note that our choice for T; j ( is similar to the random threshold in Juditsky and Lambert-Lacroix (2004), 

but the addition of the terms \ 25 log(n) Vy + & l°g(«)K^- and Vj wm allow us to compare the 
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performances of f„ with those of the oracle f n . The addition of similar deterministic and stochastic terms 
is also proposed in Reynaud-Bouret and Rivoirard (2008) to derive oracle inequalities in the context of 
Poisson intensity estimation. 

3 Oracle inequalities 

To derive oracle inequalities, we need a further smoothness assumption on the error density h: 

Assumption 3.1 There exists a constant C > and a real p > 1 such that the density h satisfies h(x) < j^^p 
for all x£E. 

Obviously the above condition for h is not very restrictive as h is by definition an integrable function on 
]R. For abounded function/ 6 L 2 ([0,1]) we denote by ||/||oo = sup xg j 01 j{|/(x)|} its supremum norm. 
Let us also define the following class of densities 

A 

D 2 ([0,ll) = {/ 6 L 2 ([0,11), with / > Oand / f{x)dx = l\. 

Jo 

3.1 The case of direct density estimation 

The following theorem states that for appropriate choices of ji,jo and the tuning parameter 6, then the 
estimator f n behaves essentially as the oracle f n up to logarithmic terms. 

Theorem 3.1 Assume that f 6 D 2 ([0, 1]) with j|/||oo < +oo . Let a > and 1/2 > n > be some fixed 
constants. For any n > exp(l), define ]\ = j\{n) to be the integer such that 2^ > n'/(logn) a > 2-' 1-1 , and 
jo = joi n ) to be the integer such that 21° > log(n) > 2- ,0 ~ 1 and suppose that n and a are such that ]\ > ]q. 
Assume that 6 > n, and take the random thresholds Ty^ given by equation (2.8). Then, the estimator f n satisfies 
the following oracle inequality 

E||/„-/|| 2 <Ci(*) 

where 

T,a ^EE ft + max, ll/IU, 1) m ax«l„g „)M) <^ + SS^. 
n Ho fc=0 n n y n 

and C\(8) and C2{S) are two positive constants not depending on n and f , and such that lim^^ C\(S) = 
lim^^^ C 2 (S) = +oo. 

The above inequality (3.1) shows that the performances of f n mimic those of the oracle f n in term 
of quadratic risk, see equation (2.10), up to a logarithmic term. The additive term T n \ depends on 
two hyperparameters a and t] which are used to control the effect of the choice of the highest resolution 
level ji and the tuning parameter 5 on the performances of the estimator. Moreover, the above inequality 
tends to show that the performances of f n deteriorates as 5 tends to rj. Thinking of the classical universal 
threshold, one would like to take 6 = 1. However, if one sets n = 1/2 and a. = 0, the additive term T n \ 

is bounded by which is rate typically faster that the decay of the oracle risk (2.10) when / belongs 

to a Sobolev or a Besov space. Hence, Theorem 3.1 shows that if one chooses )\ = \j] log 2 (n)J + 1 with 
n < 1/2, then it is possible to take S smaller than 1. Choosing carefully such hyperparameters is of 
fundamental importance, and a detailed study is therefore proposed in Section 5 to validate the results 



2)0-1 



h 2i-l 



+oo 21-1 



E %,k+ E E min^logW^H E Lfa 



k=0 



Ho k=0 



/=/l+l k=0 



+ C 2 (6)T nA , (3.1) 
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of Theorem 3.1, and to analyze the risk of /„ as a function of the resolution level j\ and the tuning 
constant 8. 

Classically, the level ]\ is chosen as the integer ]\ such that 2^ > 1 " ^ > 2^ 1 ~ 1 (see e.g. Donoho 

ef aZ (1996)). The results of this paper shows that one can use smaller level 2-' 1 of the order rfl (logn)*. 
For n < 1/2, the price to pay is a slightly lower rate for the additive term T Hi i in the oracle inequality 

(3.1) which is of the order instead of the rate j as classically obtained for the additive term when 

deriving oracle inequalities. Note that a similar result in the context of Poisson intensity estimation is 
also given in Reynaud-Bouret and Rivoirard (2008). In particular Reynaud-Bouret and Rivoirard (2008) 
obtain an additive term of the order jj but to derive such a result their proof relies heavily on the fact 
that the wavelet basis they use (the Haar basis) is such that inf x6 [ -n \tp(x) \ > which is not the case for 
Meyer wavelets. 

3.2 The case of density deconvolution 

Consider now the problem of density deconvolution under Assumption 2.1 of ordinary smooth 
deconvolution. 

Theorem 3.2 Assume that f 6 D 2 ([0, 1]) with ||/||oo < +°°, and that h satisfies Assumption 2.1 and 
Assumption 3.1. Let a > and 1/2 > n > be some fixed constants. For any n > exp(l), define 
h = H( n ) t° be the integer such that 2^ > rfl '/( v +i) (log n) a > 2^ 1_1 , and ]q = jo(n) to be the integer 
such that 21° > log(n) > 2- ,0 ~ 1 , and suppose that rj and a are such that }\ > ]q. Assume that 5 > tj (1 + ^pj), 
and take the random thresholds X;^ given by equation (2.8). Then, the estimator f n satisfies the following oracle 
inequality 



E\\fn-f\\ 2 <C 3 (8) 

where 



2'0-l ji ZJ-i +co 2i-l 

E <i*+L E mrn(^log(n)^ fc ) + E E^ 

. k=0 j=j k=0 j=h+ 1 k =0 



C 4 (S)T n>2 , (3.2) 



loz(n) A 2 !^ 1 o (loen) a ( 2l/+1 ) (loz n) 2+2a+2av 
r^^E E^ + max (ll/ll-l)max((lognr4)^^ + 

and Cs(S) and Ci{5) are two positive constants not depending on n and f, and such that\m\ s ^^ ^ C$(5) = 

lim ^^(l + T ^ T ) C 4(^) = + 00 - 

Hence, the above theorem shows that in the case of density deconvolution then the estimator also 
behaves as an oracle estimate up to logarithmic terms, and that the performances of the estimator tend 
to deteriorate as 5 tends to n (l + -^py). Similar comments to those given for Theorem 3.1 can be made. 

n „ n \2+2n+lav 

If one chooses n = 1/2, then the additive term T„ 2 is of the order „ / and 6 has to be greater 

than (1/2 + • Hence, this again shows that one can take a value for 5 smaller than 1. However the 
choice of 8 is typically larger for deconvolution than in the direct case, as it is controlled by the degree v 
of ill-posedness. 

In deconvolution problems, the high-frequency cut-off j\ is usually related to the ill-possedness v of 
the inverse problem and 73 1 = O (( i 0j f( n ) ) 1 ^ 2v+1 -^ is a typical choice for various estimators proposed 
in the literature (see e.g. Johnstone et al (2004), Pensky and Sapatinas (2008), Bigot and Van Bellegem 
(2009)). This is a standard fact that a smaller ]\ should be used for ill-posed inverse problems than in 
the direct case. Again we have introduced hyperparameters a and rj to control the effect of the choice of 
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the highest resolution level j\ . Understanding the influence of the choice of these hyperparameters on 
the quality of the estimator is a fundamental issue, and a detailed simulation study is thus proposed in 
Section 5 to validate the results of Theorem 3.2. 



4 Asymptotic properties and near-minimax optimality 

It is well known that Besov spaces for periodic functions in L 2 ([0, 1]) can be characterized in terms of 
wavelet coefficients (see e.g. Johnstone et al (2004)). Let s > denote the usual smoothness parameter, 
then for the Meyer wavelet basis and for a Besov ball B S ^^{A) of radius A > with 1 < p, q < oo, one 
has that for s + 1/2 - 1/p > 

{/2>0-l \f I +oo /2<-l \ M ' 

/ e L 2 ([o,i]) : y \cj 0)k \ p J +\E 2i{s+1/2 ^ /p) 'yL\hk\ p J <A> 

with the respective above sums replaced by maximum if p = oo or q = oo. 

The condition that s + 1/2 — 1/p > is imposed to ensure that B s p q(A) is a subspace of L 2 ([0, 1]), 
and we shall restrict ourselves to this case in this paper. Besov spaces allow for more local variability 
in local smoothness than is typical for functions in the usual Holder or Sobolev spaces. For instance, 
a real function / on [0, 1] that is piecewise continuous, but for which each piece is locally in C s , can be 
an element of Btp{A) with 1 < p < 2, despite the possibility of discontinuities at the transition from 
one piece to the next. Note that if s > 1 is not an integer, then £>| 2 (A) is equivalent to a Sobolev ball 
of order s, and that the space Bi^A) with 1 < p < 2 contains piecewise smooth functions with local 
irregularities such as discontinuities. Finally let us introduce the following space of densities 

D; > ,(A)=D 2 ([0,l])nB;,(A). 

The following theorems show that for either the problem of direct density estimation or density 
deconvolution, the estimator /„ is asymptotically near-optimal (in the minimax sense) up to a 
logarithmic factor over a wide range of Besov balls. 

To simplify the presentation, all the asymptotic properties of f n are given in the case rj — 1/2 and 
a = where rj, a. are the hyperparameters introduced in Theorem 3.1 and Theorem 3.2. 



4.1 The case of direct density estimation 

For s > and 1 < p < oo and let us define 

p' = min(2,p) and s* = s + 1/2 - 1/p'. 

Then the following result holds. 



Theorem 4.1 Assume that the conditions of Theorem 3.1 hold with rj — 1/2 and a = 0. Assume that 
y v , q {A) withs > \ + jr 



f G DiJA) with s>i + ^r, 1 < p <2andl < q <2. Then, as n —> +oo 



sup E\\f n -f\\ 2 <0 



log(n) 



2s 
2s+l 
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Minimax rates of convergence for density estimation over Besov spaces has been studied in detail 
by Donoho et al (1996). Hence, Theorem 4.1 shows that f n is an adaptive estimator which converges 
to / with the minimax rate up to logarithmic factor for the problem of direct density estimation over 

q(A). The extra logarithmic factor is usually called the price to pay for adaptivity to the unknown 
smoothness s. 



4.2 The case of density deconvolution 

Consider now the problem of density deconvolution under the Assumption 2.1 of ordinary smooth 
deconvolution. 

Theorem 4.2 Assume that the conditions of Theorem 3.2 hold n = 1/2 and a = 0. Assume that f G D s VCj {A) 
with s > j + y,t<p<2 and 1 < q < 2. Ifv(2 — p) < ps* then as n —> +oo 



sup E\\f„-f\\ 2 <0 



logK 



2S+21/+1 



and ifv(2 — p) > ps* then asn-t +oo 



sup e||/„ -/|| 2 < e> 



log(n) 



2s* 



Theorem 4.2 show that there is two different rates of convergence depending on whether v{2 — p) < 
ps* or v (2 — p ) > ps*. These two conditions are respectively referred to as the dense case when the worst 
functions / to estimate are spread uniformly over [0, 1], or the sparse case when the hardest functions 
to estimate have only one non-vanishing wavelet coefficient. This change in the rate of convergence, 
usually referred to as an elbow effect, has been studied in detail in nonparametric deconvolution 
problems in the white noise model by Johnstone et al (2004) and also Pensky and Sapatinas (2008). 
Theorem 4.2 shows that the rate of convergence of /„ corresponds to minimax rates (up to a logarithmic 
term) that have been obtained in related deconvolution problems either for density estimation or 
nonparametric regression in the white noise model (see e.g. Pensky and Vidakovic (1999), Fan and 
Koo (2002), Johnstone et al (2004), Pensky and Sapatinas (2008), and references therein). 



5 Simulations 

Simulations use the wavelet toolbox Wavelab of Matlab (Buckheit et al, 1995) and the fast algorithm 
for Meyer wavelet decomposition developed by Kolaczyk (1994). As in the simulation study of Bigot 
and Van Bellegem (2009), four test densities are considered: Uniform distribution: f(x) = 5l[o 4< o 6 ](x), 
Exponential distribution: f(x) = 10e~ 10 (' v ~ _a2 )lL[o.2,+oo[( z )/ Laplace distribution: f(x) = 10e~ 20 l A ' _a5 l, 
and MixtGauss distribution (mixture of two Gaussian variables): X ~ TZiN(pi,cr 2 ) + TI2N (}i2, cr 2 ) with 
774 = 0.4, tci = 0.6, \i\ = 0.4, U2 = 0.6 and o\ = o~2 — 0.05. These four densities, displayed in Figure 
1, exhibit different types of smoothness: the Uniform density is a piecewise constant function with two 
jumps, the Exponential distribution is a piecewise smooth function with a single jump, the Laplace 
density is a continuous function with a cusp at x = 0.5, whereas the MixtGauss density is infinitely 
differentiable. 
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(a) (b) (c) (d) 

Figure 1: Test densities: (a) Uniform, (b) Exponential, (c) Laplace, (d) MixtGauss. 

5.1 Direct density estimation 

Assume that an i.i.d sample of variables X\, . . . , X„ is drawn from one the test densities shown in 
Figure 1. The empirical Fourier coefficients = \ Hm=i ex P( — 2Tti£X m ) are computed for i = 
— N/2 + 1, N/2, where N = 2' is a dyadic integer with / chosen such that N > n. The //s 
are then used as an input of the efficient algorithm of Kolaczyk (1994) in order to compute empirical 
scaling and wavelet coefficients (Cj ^) jt=0 2 ;' -i/ (Pj,k)k=o 27-i;=/ j-V This algorithm only requires 
O (N (\og(N)) 2 ) operations to compute the empirical wavelet coefficients from a sample of Fourier 
coefficients of size N. According to Theorem 3.1, one can take }q = [log 2 (log(n))J + 1. Then, two 
hyperparameters essentially controls the quality of the estimator: the high-frequency cut-off parameter 
jl and the constant 8 used in the definition of the thresholds Ty^ given by equation (2.8). If the level j\ 
is such that 7) 1 > n''(logn) a > 2-' 1 " 1 for some rj > 0, a. > 0, then one must take 8 > rj. Increasing a 
deteriorates the rate of convergence of the additive term in the oracle inequality (3.1), so one can choose 
to set a = 0. Following the choice made for the asymptotic study of f n in Section 4, one can take rj = 1 /2 
and according to Theorem 3.1 one should then take 8 > 1/2. However, it is not clear if a value for 8 
lower than 1/2 would deteriorate or improve the quality of the estimator f n . 

Alternatively, assume that ]\ is given. Then, another possibility for choosing 5 is to take rj* = 
(jl — 1)/ l°g2( n ) which is the smallest constant rj that satisfies 2^ > rfl > 7) x , and then take 5 > rj*. 
Combining the above arguments, we finally suggest to take 

71 = fi = [\ log 2 («)J + 1 and S* = ( 7l - 1)/ log 2 (n). (5.1) 

To give an idea of the quality of f n , a typical example of estimation with n = 200, j* = 4 and 
S* « 0.3925 is given in Figure 2. Another possibility for choosing a smaller value for 8 is to take n ^ 0, 
and then to choose 5* = rj* = {j\ — 1 — a log 2 (log(«)))/ log 2 (n) since rj* is the smallest constant rj that 
satisfies 2h > «</ log(n) a > 2^~ 1 . For n = 200, a = 0.5 and ;'i = 4 this yields to the choice 8 ps 0.2351. 
However, as already remarked, the oracle inequality (3.1) shows that taking « ^ may deteriorate the 
risk of /„ . 

The goal of this numerical section is thus to validate the above choices (5.1) for )\ and 8, by studying 
the risk of f n (compared to the risk of the oracle /„) as a function of these two hyperparameters. More 
precisely, given ]\ and 8, we define 

Rn{l1,o) = : ; : r - ; : (5-2) 

To illustrate the usefulness of taking random thresholds X;^ depending of the location (j, k), we compare 
Rn (ji, 8) with the risk R n (ji,8) of the wavelet estimator obtained by taking a level-dependent threshold 
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(a) 



(b) 



(c) 



Figure 2: Typical reconstructions from a single simulation with n — 200, j\ — j\ —4 and 5 = 
C/i — 1)/ log 2 (") ~ 0.3925: (a) Uniform, (b) Exponential, (c) Laplace, (d) MixtGauss. 



Then, for each test density, M = 100 independent samples of size n = 100,200 are drawn. Empirical 
average of R n 5) and R n (ji, 8) over these M repetitions are plotted in Figure 4 (n = 100) and Figure 5 
(n = 200) with 8 G [0, 5] and j\ < ;'i < ;'| + 2. For n = 200 and N = 256, we also display in Figure 3 the 
true wavelet coefficients j6,- ^ and the standard deviation Cy ^ to show the ideal thresholding performed 
by the oracle estimator f„ . For the Uniform, Exponential and Laplace distributions and ]\ = 4, it can be 
seen that \f>j r k\ > C/,jfc f° r all jo — j — jl an d < fc < 2^ — 1 , which means that the oracle estimator f n 
performs no thresholding and keeps all empirical wavelet coefficients /3; ^ for j < j\ . For the MixtGauss 
distribution and j\ = 4 the behavior of the oracle estimator is different as for some < k < 2 ]1 — 1, one 
can observe that 1/3^1 < Cj,k- One retrieves this behavior in the first column of Figure 5 (case j\ = 4) 
for the curves R n (j\,S) and R n (j\,8) which both have a minimum at 5 = (no thresholding is done) 
for the Uniform, Exponential and Laplace distributions. For the MixtGauss distribution, R n (ji,S) has a 
minimum at 5 w 0.4 while R„ (j\, S) has a minimum at S ~ 0.8. For 5 < j\ < 7 and the Uniform, Laplace 
and MixtGauss distributions, R n (ji,S) has a minimum at some i5 G [0.4,1] and the value of R n (ji,S) 
at this point is smaller that the minimum of R n (ji,S). This indicates that taking thresholds depending 
on the location (j, k) can improve the quality of the estimation. For the Exponential distribution, the 
estimator with a level-dependent threshold of the form 5 ^Jj/n performs generally better than /„ . Similar 
comments can be made for the behavior of the curves displayed in Figure 4 (case n = 100). 

Moreover, it can been that the smallest value of the risk of /„ relative to the risk of the oracle /„ are 
obtained for j\ — j\ = \_\ log 2 («)J + 1. This indicates that introducing wavelet coefficients at resolution 
level larger than jf generally deteriorates the quality of the estimation. Finally, note that the curves in 
Figure 4 and Figure 5 tends to confirm that the choice (5.1) for j\ and 8 is reasonable, and leads to very 
satisfactory estimators. 

5.2 Density deconvolution 

In the case of density deconvolution, we propose to compare the performances of our wavelet approach 
with those of the adaptive density deconvolution estimator of Comte, Rozenholc and Taupin (2006a), 
Comte et al (2006b) that is based on penalized contrast minimization over a collection of models 
containing square integrable functions with Fourier transform having a compact support. Such an 
estimator is therefore a band-limited function, see Comte et al (2006b) for further details. Moreover, 
this method can be viewed as a kind of adaptive linear wavelet estimator with a Shannon wavelet 



of the form 6 




:, as originally proposed by Donoho et al (1996), where 



Rn(jl,8) 
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(a) (b) (c) (d) 

Figure 3: True wavelet coefficients \(ijjc\ (solid curve), and standard deviation (dashed curves) as a 
function of 2' + k for / = ; , . . ., / - \,k = 0, . . . , 2> - 1 (with j = 3, J = 8), for each test density: (a) 
Uniform, (b) Exponential, (c) Laplace, (d) MixtGauss. 

basis which is also a band-limited function like the Meyer wavelet but less localized in the time domain. 
Comte et al (2006b) have shown that the model selection procedure performs very well for finite samples, 
compared with other standard estimators. In particular, this estimator outperforms the kernel estimator, 
even when the bandwidth parameter is selected in a data-driven way which makes this procedure as 
the most challenging competitor in our simulations. 

Observations Y,, i = 1, . . . , n are generated from the additive model Y, = X; + e\, where X, are 
independent realizations from one of the test functions / displayed in Figure 1, and the e,'s are 
i.d.d. additive errors with density h. Results are presented with a Laplace measurement error, that 
is h(x) = {-JlcTe) -1 exp( — \/2\x \ / cr £ ) , for x £ R and where cr £ is the standard deviation of the errors. 
The Fourier coefficients of h are given by kg — (1 + 2c 2 7r 2 £ 2 )~ 1 , £ 6 Z which thus corresponds to the 
case of ordinary smooth deconvolution with v = 2. The main quantities in the simulations are the 
sample size n and the root signal-to-noise ratio defined by s2n = Cx/cr e with Ux = -\/var(Xi). 

According to Theorem 3.2, one can take jo = Llog 2 (log(n))J + 1 and important quantities to control 
the quality of estimation by wavelet thresholding are the highest resolution level ]\ and the tuning 
parameter 5. If j\ is such that 2h > «'?^ t/+1 )(logn) a > 2^ 1_1 for some t] > 0, a > 0, then Theorem 
3.2 suggests to take 8 — rj (l + ^tj)- As already remarked, for ill-posed inverse problems, a smaller j\ 
than in the direct case should be used. However, for n = 200, following the asymptotic considerations 
in Section 4, the choices r\ — \/2, a. — Q and v = 2 yield to j\ = log 2 («)J +1 = 2 which is smaller 
than jo = |log 2 (log(ji))J +1 = 3. Hence, setting in advance values for rj and a may yield a theoretical 
choice for j\ that cannot be used in practice. Note that this issue has been noticed in several papers on 
deconvolution by wavelet thresholding, see Johnstone et al (2004), Bigot and Van Bellegem (2009). 

Alternatively, let us argue as in the direct case, by assuming that j\ is given. If one sets oc = 0, 
then for choosing 5 one can take r/* = (v + — 1)/ log 2 (n) which is the smallest constant t] that 
satisfies lh > n^O+l) > 2/1- 1 , and then take <5 = (1 + ^) rj* = (2v + 1) (ji - l)/log 2 (n). A 
smaller value for S can also be made, by choosing a ^ and by taking 5 = (1 + -^py) Tj* with 
tj* = (v + l)(;i-l-alog 2 (log(n)))/log 2 (n). 

We report results for n = 100, 200 and s2n = 3, which a relatively large signal-to-noise ratio. To give 
an idea of the quality of f n and to compare it with the model selection estimator, a typical example of 
estimation with n = 200, jj = ]q = 3, a = 0.5 and S = (2v + 1) (ji — 1 — alog 2 (log(rc)))/ log 2 (rc) « 
0.5215 is given in Figure 6. Both methods perform similarly for the estimation of the smooth density 
MixtGauss. For the three non-smooth densities, wavelet thresholding performs much better than the 
model selection estimator. This comparison on a single simulation tends to confirm the superiority of 
wavelet-based methods over those based on Fourier decompositions for the reconstruction of signals 
with local singularities. 

However, it is not clear how to choose j\ in practice. Indeed, an optimal theoretical level can be 
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(a) Uniform - /j = 4 (b) Uniform - ji =5 (c) Uniform - ji =6 




(d) Exponential - jj =4 (e) Exponential - \\ = 5 (f) Exponential - ji =6 




(g) Laplace - ji = 4 (h) Laplace - ;'i =5 (i) Laplace - ;'i = 6 




(j) MixtGauss =4 (k) MixtGauss - /i =5 (1) MixtGauss -j\ = 6 



Figure 4: Direct density estimation with n = 100. Evolution of R n {j\,5) (solid curves) and R n (ji,S) 
(dashed curves) as a function of 5 E [0,5] for different values of j\ > j\ = [ \ log 2 ( n ) J +1 =4 

too small, but taking a too high level of resolution may introduce some instability in the estimation. 
Moreover, it is of interested to study if 5 can be smaller that t] (l + -^j)- A goal of the following 
simulation study is thus to identify a reasonable empirical range of values for ]\ and 5. 

For a given ]\ and 5, the risk R n (]\,S) of the wavelet-based estimator, see (5.2), will be compared 
to the risk of the model section procedure divided by the risk of the oracle f n - For each test density, 
M = 100 independent samples of size n = 100, 200 are drawn for sin = 3. Empirical average of R n 
and of the risk of the model selection estimator (divided by the oracle risk) over these M repetitions are 
displayed in Figure 7 (n — 100) and Figure 8 (n = 200) for 5 6 [0, 5] and ]q < ]\ < ]q + 2. For 
jl = jo, wavelet thresholding clearly outperforms the model selection estimator for all values of S and 
all densities, expect for the Uniform distribution with n = 200 for which it can be seen that model 
selection is slightly better if 5 is larger than 0.2. These simulations, also show that the choice ]\ = jo = 3 
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(a) Uniform - /j = 4 (b) Uniform - ji =5 (c) Uniform - ji =6 




(d) Exponential - jj =4 (e) Exponential - \\ = 5 (f) Exponential - ji =6 




(g) Laplace - ji = 4 (h) Laplace - ;'i =5 (i) Laplace - )\ = 6 




(j) MixtGauss =4 (k) MixtGauss - ji =5 (1) MixtGauss -j\ = 6 



Figure 5: Direct density estimation with n = 200. Evolution of R n {j\,5) (solid curves) and R„(ji,S) 
(dashed curves) as a function of 5 E [0, 5] for different values of j\ > j\ = \_ \ log 2 ( n ) J +1 =4. 

yields the best results. This observation is consistent with the condition of Theorem 3.2 that suggests 
a smaller ]\ for ill-posed inverse problems than in the direct case. It also confirms that introducing a 
higher level of resolution clearly deteriorates the quality of the estimator when compared to the oracle 
risk. 

Combining the above remarks, we finally suggest the following choice 
k = h = U°g 2 (log(n))J +land<5= (2v + - 1 - «log 2 (log(n)))/ log 2 (n) with oc = 0.5. 

For n = 100, this yields to ]\ — 3, 5 w 0.6761 and for n = 200 to j\ = 3, <5 ~ 0.5215. The curves in Figure 
7 and Figure 8 indicate that such a choice is reasonable, and leads to satisfactory estimators. 
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Figure 6: Typical reconstructions from a single simulation with n = 200 for the four test densities 
Uniform (1st column), Exponential (2nd column), Laplace (3rd column), MixtGauss (4th column) 
contaminated with Laplace noise, by wavelet thresholding (1st row) with ]\ — jg = 3, 5 « 0.5215, 
and model selection (2nd row). 

Appendix 

In what follows C will denote a generic constant whose value may change from line to line. Proofs are 
given for the case where h satisfies Assumption 2.1 and Assumption 3.1. The proofs for the case of direct 
density estimation follow from the same arguments. 

5.3 Technical lemmas 

We start this technical section by a set of lemmas that will be used in the proof of Theorem 3.2. For all 
the results presented below, it is supposed that / 6 D 2 ([0,1]) with ||/||oo < +°°- 

To prove these results we will use the following properties which come directly from the fact that 
Meyer wavelets are band-limited and that under Assumption 2.1 \h(\ ~ \£\~ v - 



Lemma 5.1 For all integer j, of^ = E(j§y /jt - fa) 2 < V j/k < C||/||co^ and oj^ = E(c /0/fc - c j0)k ) 2 < 



\ipj,k\ < 2" ;72 and #C ; - < 4tt2^' 
\hi\~ z < C2 2 ^foralH £ Cy, 

where #C denotes the cardinality of the set Cy. 



(5.3) 
(5.4) 




Proof: recall that Vjjc is an upper bound for crj k , and that from equation (2.5), one has that 
n Ik I $j,k(y) \ 2 f Y (y)dy- Then, remark that since h satisfies Assumption 3.1, it holds that 



1 ftn+l I rm+1 rl 

V iM = ~L \hM\ 2 f Y {y)dy=- E/ \% k (y)\ 2 f(u)h(u-y)dudy 
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(a) Uniform - ]\ — 3 (b) Uniform - ji =4 (c) Uniform - ]\ —5 




(d) Exponential - ]\ =3 



(e) Exponential - ]\ = 4 



(f) Exponential -ji =5 



(g) Laplace -ji=3 




(h) Laplace - j\ = 4 



(i) Laplace -]\ =5 




(j) MixtGauss - ji — 3 



(k) MixtGauss - ]\ =4 



(1) MixtGauss - )\ =5 



Figure 7: Density deconvolution with s2n = 3 and n = 100. Evolution of R n {j\,S) as a function of 
S E [0,5] for different values of ]\ > jo = 3 (solid curves). The dotted lines represent the risk of the 
model selection estimator divided by the risk of the oracle. 

where 7 > 1 is the real defined in Assumption 3.1. Now, since tfj^iy) is a periodic function on 1R with 
period 1 it follows that 

1 I 1 



Vj, k < qi/ll 



n Jo 



\$j,k(y)\ 2d y E 



mez 



Finally using Parseval relation and the bounds (5.3) and (5.4), it follows that J \tf>j,k(y)\ 2 dy 



Mi 2 



YLleCj Wt \ 2 = TileCj ~\hj r — ( -2 2,v which finally implies (using the fact that 7 > 1, ) 

2 2;V 

^<C||/||.— 



which completes the proof. The argument is the same to bound cr? fc . 



□ 
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(a) Uniform - ]\ =3 (b) Uniform - ji =4 (c) Uniform - ]\ —5 



(d) Exponential -j\ =3 



(e) Exponential - ]\ = 4 



(f) Exponential -ji =5 




(g) Laplace - j\ = 3 



(j) MixtGauss - /i =3 



(h) Laplace - ;\ =4 



(i) Laplace - j\ =5 




(k) MixtGauss - ji =4 



(1) MixtGauss - ji = 5 



Figure 8: Density deconvolution with sin = 3 and n = 200. Evolution of R n (ji, 5) as a function of 
S E [0,5] for different values of ]\ > jo = 3 (solid curves). The dotted lines represent the risk of the 
model selection estimator divided by the risk of the oracle. 



Lemma 5.2 For all p > 2, E(fa - fi^f? < Cmax(||/||£, , 1) + ~~2^r 



Proof: by definition fa - fa = \ E"„=i Z m where Z m = EfeC, ff ( e 



TJ' k f„-27ti£Y m 



fyhi ) . Remark that for 



all m, EZ m = 0, and that from Lemma 5.1 Var(Z,„) < C||/|| o2 2 ^ v . Since / and h are densities, \ f(h(\ < 1. 

i> ,k \ 

Hence using (5.3) and (5.4), this implies that \Z m \ < 2^ eC . ^ < C2! ( - v+1/2 \ Then the result follows 
from Rosenthal's inequality (see Rosenthal (1972)). □ 



Lemma 5 



.3 For any positive x, P (j/3y,/ c — jSy^l > ^JlV^x + |^x^) < 2 exp(— x), where t]j = E^eCy 1^1- 



Proof: note that fa - fa = \ E* =1 (W m - EW m ) where W„, = £teq e~ 2nUYm . Then, remark that 
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by definition Vj k = ^ E"i=i E|W m | 2 , and that \W m \ < YjteCj l^ll- Hence, the Z„,'s are bounded random 
variables, and thus the result follows from Bernstein's inequality (see e.g. Proposition 2.9 in Massart 
(2006)). □ 

Lemma 5.4 For any positive x, 



P [ Vjk > V jk + J 2^V jk x + k^x ] < exp (-x) . 



where ; 



Proof: the proof is inspired by the proof of Lemma 1 in Reynaud-Bouret and Rivoirard (2008). By 
definition V jk = 4j £^ =1 W m with W m = Ei,e> £Cj ^l k ^e~ lni{t ~ e)Ym - Then, remark that |W„,| < 
EweCy Wl \ Wp\ which implies that | W m \ < t]j for all m = 1, ...,n. Moreover, one can remark that 
E|W m | 2 = E\xp j/k (Y m )\ A = J R \xp j/k (y)\ A f Y (y)dy. Then, since \$ jik {y)\ 2 < (jUeq |<?f|) 2 , it follows that 



E|W m | 2 < tjj | r \^, k {y)\ 1 f Y {y)dy < njnV jk . 



Hence, by applying Bernstein's inequality (see e.g. Proposition 2.9 in Massart (2006)), one obtains that 
for any positive x 



P ( V jk > V jk + f-^V jk x + ^Lx) < exp (-x) . (5.5) 
Now, let u = -jfX and let g(y) = y 2 — \[2~uy — ^ for y > 0. From (5.5), one has that 



P(*(0ft) > V jk ) <exp(-x) 



As Vjk and Vj k are positive, one can check that it is possible to invert the inequality g{^JVj k ) > Vj k to 
obtain that 

P (Jvjk > g'HVjk)) <exp(-x), 



and that g 1 (y) = J y + ^ + J | . Hence, one obtains that for any positive x 



±^u + \/2uVj k + ^ 1 



Now, using the fact that J 2uV k + ^ < \ 2uVj k + J ^f, it follows that 



4 /. ~ 5u 2 



Vj k + 3 " + y 2uV jk + — < Vjk + \]2uVj k + ku, 
which completes the proof. □ 
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5.4 Proof of Theorem 3.2 

Let us define the following set of integers A„ = {(j,k);j < j < ji,0 < k < 2 - 1}. We first establish 
the following proposition: 

Proposition 5.1 Let e > 0. Then, for any subset of indices m C A„ 

||/n-/|| 2 < E(^- c m) 2 +(-^) E ^+(2 + e) E (pW/* 

fc=0 \ fc / yjt)eA„\m (;',/c)Gn! 

+ (^) E4 + (^) 2+ f E4 

x 7 Era x 

wJiere m = {(/,fc); |0 yjfc | > T j/k ,j < j < ji,0 < k < 2' - 1}. 

Proof: the proof is inspired by model selection techniques (see Massart (2006) and also Reynaud-Bouret 
and Rivoirard (2008) ). Let m C A„ and define 

fm= E ^/k and /m= E Pfltfjk- 

(j,k)£m (j,k)em 

Note that f tl is given by / m = E{L /0 E^ 1 ^^kl^l^}^- Then ' define 

7m = - E and pen(m) = E T fk' 

(j,k)em (j,k)&n 



and one can check that 



rh = arg min 7,,, + pen(m) 

meA„ 



Now let / = X^t™ E^q 1 Pjktyjk an d remark that for any m C A„ 

+00 2/-1 

7 m = 11/™ - /T - E E P% - 2 E fait* - p»- (5.6) 

;=;'o k=0 (j,k)em 
Then, by definition of rh, equality (5.6) implies that for all m C A„ 

||/m-/T< ||/«-/|| 2 -2 E faCPjk-Pjk)+2 E ^jSjk-^k)+pen(m)-pen(ifi). 

{j,k)£m (j,k)£m 

Using the Pythagorean equality \\f m — f\\ 2 = \\f m — f\\ 2 + ||/ m — f m || 2 one finally obtains that 

||/m-/|| 2 < \\fm-f\\ 2 -\\f m -fm\\ 2 + 2 £ f>0 jk - f> jk ) - 2 £ P>(p> " P>) 

(j,k) 6 m (j,k) e m 

+pen(m) — pen(m). (5.7) 
Let e > and remark that, 



2 E p>(p> - p» - 2 E p>(p> - fy) < 2 (ll/m - /II + II/* -f\\) E (fo - Pjk) 2 (5.8) 

(j,k)£m (j,k)£m y (j,k)£mUm 
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Then, remark that YL(j r k)emUm 

(jSyjt - fij k ) 2 < \\f in - fm\\ 2 + \\fm - fm\\ 2 and thus by using twice the 
inequality lab < da 2 + (l/8)b 2 with 6 = 2/(2 + e) and 9 = 2/e, and by inserting inequality (5.8) in 
(5.7), one obtains that 



II/* -/T < ^||/ m -/|| 2 + e ||/„«-/,,|| 2 +pen(m) 

+ (l + e)||/ )ft -/, ft || 2 -pen(m). (5.9) 
which completes the proof since /„ = T^Iq 1 Cj k<Pj ,k + /'«■ ^ 
Next we prove the following lemma: 

Lemma 5.5 For any 5 > n (l + -^i), there exists e = e(5) > such that 5 > (1 + e) (l + ^4), and a 
positive constant C not depending on n such that 

El £ (1 + e) (p jk - /3 /7c ) 2 - t 2 ) < Cmaxdl/lloo, 1) max(log(n)", 1) ( lo g")J (2V+1) , 

\{j,k)em / 

Proof: let Z = E(y, fc ) gr?I (l + (?) ^ - /S ; -fc) - t 2 c . By definition of m 

A 2 ^ / / » X 2 2 



EZ = Mfc-ftt % ; ,|>^ } 



< E£ 2 £( (l + e)(p jk -fS jk Y)t 



Ho k=0 



{\fa\>^\$ jk -p jk \>^} 



Now by applying Holder inequality, one has that for all p > 2 and ^ > 1 such that i + ^ = 1 



iL2Ll/ \2p\ 1/P / / - Tft 



EZ< (1 + e) £ £ (E(fo-fyfc) ) ( P ( |^i - /3 y7c | > ) ) (5.10) 



Now by Lemma 5.2 one has that 

1/p 



E 



/, \2p\ 1/p /2 2 W 2/P( 2l ' +1 A 



By definition of j\ and jo and since n < 1/2, one has that there exists a constant C such that for all 
jo < j ; < /l/ 4= < Clog(n) a which implies (since p > 2) that ^p-T' < ^^T" log(n) ap . By inserting this 
inequality in (5.11) one obtains that 

2p \i/p 2 2 i u 



E 



(fa-fa) ) < Cmaxdl/lloo, l)max(log(«r,l) — . (5.12) 
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, if a if 



Now let jjk = y 25 log(n)^j Vj k + k-^_5 log(n), and remark that by definition of the threshold Xj k and by 
using Lemmas 5.3 and 5.4 it follows that 



< p |^jfc-^jfc|> 



l lVjkSlogjn) t]j 6 log(n) 
1 + e 3n 1 + e 



< C(n~^ +n^ s ) <Cn~Tf~e (5.13) 

Now inserting (5.12) and (5.13) into inequality (5.10), and using the definition of ]\ and ]q one finally 
obtains that for any q > 1 and e > 

A 2/-1 , 

EZ < Cmax(||/|| 00 ,l)max(log(H)M) £ J] 2 2 ^n ^ 

;=/o fc=o 

< Cmax(||/|| CO/ l)max(log(n) fl: / l)n _1_; K I ^)2A( 2l ' +1 '. 
By definition of 7i, 2/i( 2l,+1 ) < Cn''T+r i\ogn)< 2v+v > = Cn'( 1+ wr) (logn) a ( 2l ' +1 ) which implies that 

EZ < Cmax(||/||oo,l)max(log(«) a / l)n _1+ ' / ( 1+ ^^^(log«) a ( 2t,+1 ) 

By assumption, 8 > n (l + • Hence there exists e > such that 5 > (1 + e)q (1 + ^-), and one can 
then always find some q > 1 such that 5 > q(l + e)?/ (l + . This implies that 

EZ < Cmax(||/||oo, 1) max (log(tt) a ; l)tt _1 (logtt) a(2l/+1 ) 
which completes the proof. □ 

Now, using the inequality (a + b) 2 < 2a 2 + 2£> 2 one has that 

2 



EtJ < 4<nog(n)E% + 2^1og(n 



where Vj k = Vj /k + y 26 log ( n ) Vj /k ^ + J log ( n ) jc ^ . Then using the inequality 2ab < a 2 + b 2 and the fact 
that EVi k = Vj k , it follows that 

EV jk <3V jk + (l + K)Slog{n)^, 
which finally implies that there exists a constant C such that 

EtJ < C (login) V jk + ilogn) lV -L\ . (5.14) 
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Now, using Proposition 5.1, Lemma 5.5 and inequality (5.14), it follows that there exists two constants 
C(S) and C(S)' not depending on n and /, such that for any subset of indices m C An 

2'0-l I +oo 27-1 

EIIWII 2 < L°fo,k + C ( S )[ E P%+ E (l + Iog(n))ty+ E 

fc=0 V Q,k)eA„\m (j,k)em j=h+ 1 k=0 

((i no . M \a(2v+l) « 2 \ 
maxdl/lloo^JmaxOog^M)^^ + (logn) 2 £ ± 5,15) 

and one can easily check that lim^ [ ^ ^ | v ^ C(S) = lim^^/^ *\ C'(<5) = +oo Then, as //y = EfeCj l^/fcl 

„2 

it follows from (5.3) and (5.4) that nj < C2^ 2v+V >, which implies that for any m C A n , E(j,k)em % < 
C^EyL/ Ef=o l2; ' (2t,+1) ^ C 2/l( ^ +2) < Cn 2 ^" 1 ' (log «) a ( 2t '+ 2 ) by definition of ft. Inserting this inequality 
in (5.15) with the model m = | (/,£); |/5^| 2 > log(n)V / ; c | finally yields that there exists two constants 
C(S) and C(S)' such that 

2^0-1 / ji 2i-\ +00 27-1 \ 

E||/«-/|| 2 < E °j ,k + C(t) E E min(/3 2 fc ,log(n)^)+ £ £ ,3 2 fc 
J:=0 V=/o k=0 7=71+1 fc=0 / 

/ aog«) ff ( 2t,+1 ) aog«) 2+a ( 2i,+2 )\ 

+C'(J) fmax(||/|| 00 ,l)max(log(n)M)^^ + 1 § J ■ 

To finally obtain inequality (5.15), remark that Vj^ = o~j k + ^/5 2 j., and one can easily check that 

nun($ t ,log(n)ty) < min(/3 2 c , logger 2 ) + 
which completes the proof of Theorem 3.2. □ 

5.5 Proof of Theorem 4.2 

Given our assumptions, one has that 1 < p < 2 which implies that s* =s + l/2 — 1/p. First we need 
the following lemma: 



Lemma 5.6 Iff € B s pq (A) with 1 < p < 2 ften 



27-1 



E $k < A 2 2- 2 ' s \ (5.16) 
fc=o 

Moreover, ifs > 1/p + 1/2 twi/z 1 < p < 2, f/zen f/zere exzsfs a constant B > smc/z t/iai 

sup H/lloo < B. 
/eB^(A) 

Proof: since / e B^(A) one has that ig!^ 1 l c /'o,fcl P ^ AP and Ejfco l/W ^ AP2-/X s+1/2 - 1/ P). Since 
p < 2 it follows that 

/2/0-1 \ 1/2 /2k-i \ 1/p 

(EM 2 ) <[EM p ) ^ < 5 - 17 ) 
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and 



'27-1 

E \M 2 



1/2 



'27-1 



1/p 



(5.18) 



,/c=0 



which proves the first part of the Lemma. Then, remark that 

270-1 +oo 27-1 

ll/llco < || E C k^kA\™ + E II E fyVt'Mloa 
fc=0 ;=; fc=0 

Now, let x G [0, 1] and remark that by Cauchy-Schwartz inequality 

2 



(5.19) 



and 



270-1 

E c b*<Pio,k( x ) 

fc=0 



27-1 

E Pj,k*Pj,k( x ) 
k=0 



'270-1 



<2 2 nni[ e ka 

\ k=0 



/271-1 > 

<2 2 ^!i 2 J E IM 2 

V k=0 j 



Hence the above inequalities, (5.17), (5.18), and (5.19) imply that 



H/lloo <A2'° \\(f>\\«, + A^lloo £ 2 -^ s - 1/2 - 1/ P) 



;=/o 



By assumption s - 1/2 - 1/p > which implies that 2-K s - 1 / 2 - 1 / P) < +oo , and thus the result 
follows with B = A (27 , o||^||oo + IMUEg, 2-/( s - 1/2 " 1/ P)) . □ 

Note that although not always stated Lemma 5.6 will imply that the various bounds given below 
hold uniformly for / £ D s p cj (A). 

Let R„ = Ri + R 2 + R3 with 

270-1 



R 



h 27-1 +oo 27-1 

l=E<;c R 2 = E E min (^og(n)V jk ) and R 3 = £ £ 

fc=0 ;'=/o /c=0 ;=/i+i *;=o 



Given our assumptions on / and since t] = 1/2, Theorem 3.2 and Lemma 5.6 imply that there exists two 
constants C and C' such that for every n > exp(l) and all / 6 y q(A) 



E||/„-/|| 2 < CR n + C 



/(logn) 2 



(5.20) 



Hence to study the rate of convergence of f„, it suffices to study the asymptotic behavior of R n . From 
Lemma 5.1 and by definition of /q one has that R\ < C 2/o( ^ +1> < C ^ log which implies that 

Rl — O (n 35WF1 ^ (in the dense case) or Rj = ^n^^sj (in the sparse case) . (5.21) 
Then remark that Lemma 5.6 implies that R 3 = O (l~ 2 h s *\ = O (n~^ (logn)" 2as *) by definition of 



jl . Now remark that if p = 2 then s* = s > 1 (by assumption) and thus j^pl > 2 S +2v+l • If 1 < p < 2 then 
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s* = s + l/2 — 1/p, and one can check that the condition s > 1/2 + 1/p implies that > 2s+2v+\ ^ 
v(2 - p) < ps*, and that > 2s 2 +2v ^ ~ V) - V s *- Hence one obtains that 

R 3 = O (tTIS^tA if v {2- p) < ps*, (5.22) 

R 3 = O (n~^^\ if i/(2- p) > ps*. (5.23) 

(5.24) 

Now it remains to study the term R 2 . 

For this let us consider first the dense case when v(2 — p) < ps*, and decompose R 2 = R 2 i + R22 
with 

R 2i = E 2 Emin(/52 /c/ log(n)y //c ) and R 22 = £ E mm(/3 2 fc/ log(n)V, fc ), 

Ho fc=0 H2+I k=0 

where j 2 = ji{n) is the integer such that 2) 1 > (n/ log («)) 2S+2U+1 > 2> 2 ^ 1 (note that given 
our assumptions j 2 < ]\ for all sufficiently large n). Then remark that by Lemma 5.1, R 2 \ < 
D| 2 =;o Eto llo g(")^ < CEf =;o log(n)^ < Clog(n) 2 ^ < C(n/log(n))-=&n. Hence 



R 21 = O [{n/ log(n))-s+iF+T J (5.25) 

Now remark that if p > 2 then by equation (5.16) and by definition of j\ and j 2 it follows that 

R22 < C EyL/ 2 +i 2_2/s < c (w" 3 ^ 5 - (n/log(n)) _ 2s+a/+iJ which implies that 



R22 = O Un/ log(«))"s+27+r j . (5.26) 
Now if 1 < p < 2 remark that R 2 2 can be written as 



22 



R 22 = E E^ 1 {^<log(n)V 7 ,}+ 1 Og(")^ 1 {^>log(n)y ; ,} 
;=/2+l fc=0 

6? jt <log(n)^} + (log(")^) 1_ ' vjkr ^j k >lo S (n)V jk } 



E 2 E l^l 2 "H^|"ll { ^ <log( „ )V + (log(n)^) 1 -P /2 (log(n)^)P/ 2 l { ^ >log(n) 
;'=;'2+l ^=0 



< 2 E eW^)^) 1 ^ 72 !^- 

By Lemma 5.1 < C 2 ^, and since / G ( A) if follows that there exists a constant C depending only 
on p, q, s, A such that I /V | p < C2^' S *P, which implies that 

R22 < C(n/log(n))- 1+ ? /2 E 2^ 2l, - v P- s ». 

7=72+1 

Now in the dense case, one has that 2i/ — i/p — s*p < which implies that 



R 22 = O 



({n/ \og(n))- 1+ P /2 2!^ 2v - v P- s "P^ = O Un/ log(«))-s^+r) (5.27) 
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by using the definition of ]i- Hence combining (5.25), (5.26) and (5.27) it follows that in the dense case 
for 1 < p < oo 



R 2 = {in/ log(n))~s+2i7+T j . (5.28) 
Now consider the sparse case when v{2 — p) > ps* , and decompose R 2 = R 2 i + R22 with 

R 2i = E 2 L Vjk) ^d R 22 = £ 2 L ™™{F>%' Vjt), 

Ho k=0 i=h+ 1 k=0 

where / 2 = / 2 (w) is the integer such that T) 1 > (n/ log(n)) 2s *+ 21 ' > 2# _1 . Note that in the sparse case 
then necessarily 1 < p < 2, and as previously one can thus remark that R 2 j can be written as 

R21 = t L <iog(«)V +log(n)^l { ^ >log(w)V < 2 t 2! £(lo g (n)V jk )i-r/% k \V. 

;'=;'o fc=0 ;=; fc=0 

Now using again the fact that V jk < C^f (by Lemma 5.1), that \p jk \P < C2^' s > (since / E 

B s v q (A)), and by definition of / 2 and jo it follows that 



R21 < C(n/log(n))- 1+ P /2 - 2/o(2v-pv- P s*)^ = f ( n /i g( n ))-ra 



(5.29) 



Then, remark that by equation (5.16), R 2 2 < EyL 2 +i Ef^o* &jk - EyL; 2 +i 2_2;s " '• Hence, R 22 
( 2"% s * ) , and thus by definition of ; 2 



R22 = C Mn/log(n))~ 2^2Fj . (5.30) 
Finally combining (5.29) and (5.30) it follows that in the dense case 

R 2 = O f(n/log(n))~^£s^ . (5.31) 

Then, combining (5.21), (5.22), (5.23), (5.28) and (5.31) implies that R n = f(n/ log(n))"2iWFTJ in the 

dense case, and R„ = ^(n/ log(n)) -2s *+ 2i/ ^ in the sparse case. Using inequality (5.20) this completes 
the proof of Theorem 4.2. □ 
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